Performance modeling and analysis of heterogeneous lattice Boltzmann simulations on CPU-GPU clusters

نویسندگان

Christian Feichtinger

Johannes Habich

Harald Köstler

Ulrich Rüde

Takayuki Aoki

چکیده

Computational fluid dynamic simulations are in general very compute intensive. Only by parallel simulations on modern supercomputers the computational demands of complex simulation tasks can be satisfied. Facing these computational demands GPUs offer high performance, as they provide the high floating point performance and memory to processor chip bandwidth. To successfully utilize GPU clusters for the daily business of a large community, usable software frameworks must be established on these clusters. The development of such software frameworks is only feasible with maintainable software designs that consider performance as a design objective right from the start. For this work we extend the software design concepts to achieve more efficient and highly scalable multi-GPU parallelization within our software framework waLBerla for multiphysics simulations centered around the lattice Boltzmann method. Our software designs now also support a pure-MPI and a hybrid parallelization approach capable of heterogeneous simulations using CPUs and GPUs in parallel. For the first time weak and strong scaling performance results obtained on the Tsubame 2.0 cluster for more than 1000 GPUs are presented using waLBerla. With the help of a new communication model the parallel efficiency of our implementation is investigated and analyzed in a detailed and structured performance analysis. The suitability of the waLBerla framework for production runs on large GPU clusters is demonstrated. As one possible application we show results of strong scaling experiments for flows through a porous medium.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A flexible Patch-based lattice Boltzmann parallelization approach for heterogeneous GPU-CPU clusters

Sustaining a large fraction of single GPU performance in parallel computations is considered to be the major problem of GPU-based clusters. In this article, this topic is addressed in the context of a lattice Boltzmann flow solver that is integrated in the WaLBerla software framework. We propose a multi-GPU implementation using a block-structured MPI parallelization, suitable for load balancing...

متن کامل

A Holistic Scalable Implementation Approach of the Lattice Boltzmann Method for CPU/GPU Heterogeneous Clusters

Heterogeneous clusters are a widely utilized class of supercomputers assembled from different types of computing devices, for instance CPUs and GPUs, providing a huge computational potential. Programming them in a scalable way exploiting the maximal performance introduces numerous challenges such as optimizations for different computing devices, dealing with multiple levels of parallelism, the ...

متن کامل

Performance analysis of single-phase, multiphase, and multicomponent lattice-Boltzmann fluid flow simulations on GPU clusters

The lattice-Boltzmann method is well suited for implementation in single-instruction multiple-data (SIMD) environments provided by general purpose graphics processing units (GPGPUs). This paper discusses the integration of these GPGPU programs with OpenMP to create lattice-Boltzmann applications for multiGPU clusters. In addition to the standard single-phase single-component lattice-Boltzmann m...

متن کامل

Accelerating Solid-fluid Interaction using Lattice-boltzmann and Immersed Boundary Coupled Simulations on Heterogeneous Platforms

We propose a numerical approach based on the Lattice-Boltzmann (LBM) and Immersed Boundary (IB) methods to tackle the problem of the interaction of solids with an incompressible fluid flow. The proposed method uses a Cartesian uniform grid that incorporates both the fluid and the solid domain. This is a very optimum and novel method to solve this problem and is a growing research topic in Compu...

متن کامل

Implementing the lattice Boltzmann model on commodity graphics hardware

Modern graphics processing units (GPUs) can perform generalpurpose computations in addition to the native specialized graphics operations. Due to the highly parallel nature of graphics processing, the GPU has evolved into a many-core coprocessor that supports high data parallelism. Its performance has been growing at a rate of squared Moore’s law, and its peak floating point performance exceeds...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

Parallel Computing

دوره 46 شماره

صفحات -

تاریخ انتشار 2015

Performance modeling and analysis of heterogeneous lattice Boltzmann simulations on CPU-GPU clusters

نویسندگان

چکیده

منابع مشابه

A flexible Patch-based lattice Boltzmann parallelization approach for heterogeneous GPU-CPU clusters

A Holistic Scalable Implementation Approach of the Lattice Boltzmann Method for CPU/GPU Heterogeneous Clusters

Performance analysis of single-phase, multiphase, and multicomponent lattice-Boltzmann fluid flow simulations on GPU clusters

Accelerating Solid-fluid Interaction using Lattice-boltzmann and Immersed Boundary Coupled Simulations on Heterogeneous Platforms

Implementing the lattice Boltzmann model on commodity graphics hardware

عنوان ژورنال:

اشتراک گذاری